It seems that we Belgians just love confusing foreigners…
Imagine wanting to take a train from Ghent to Mons, but the only train listed is going to Bergen. Or driving south with a GPS telling you to follow the direction of Liège, while the road signs keep indicating Luik for as long as you are on Flemish territory.

Mons/Bergen, Liège/Luik, Ypres/Ieper… those names refer to exactly the same city - one of them is the official French name, the other one the official Dutch one.

Two weeks ago, I once again heard a story from foreigners who got very confused, and I realized I have no idea how many towns/cities like this we have. Sounds like a perfect time to work on my spatial object skills!

Data source

I found everything I needed on this website from the Belgian government. The data is from early 2017.

Cleaning the data

Starting by loading the packages needed:

#packages for the data exploration
library(tidyverse)
library(readxl)
library(ggplot2)

#packages for the maps
library(sp)
library(tmap)
library(viridisLite)
library(leaflet)
library(BelgiumMaps.StatBel)

Importing the data:

#Importing the data
raw_data <- read_excel("TF_SOC_POP_STRUCT_2017_tcm325-283761.xlsx", sheet=1)

The data contains a lot of unneeded administrative data, and I wanted to rename some columns to English.

#Keeping only the variables needed
data <- raw_data %>% 
  select(contains("MUNTY"), TX_RGN_DESCR_NL, CD_SEX, TX_NATLTY_NL, TX_CIV_STS_NL, CD_AGE, MS_POPULATION)
colnames(data) <- c("REFNIS", "TownNL", "TownFR", "Region", "Sex", "Nationality", "MaritalStatus", "Age", "Population")

#Translating Region names to English
data$Region <- data$Region %>% 
  str_replace("Vlaams Gewest", "Flanders") %>% 
  str_replace("Waals Gewest", "Wallonia") %>% 
  str_replace("Brussels Hoofdstedelijk Gewest", "Brussels agglomeration")

Additionally, the data does not contain a single population count per town; it is split into demographic subsets.
If I wanted to know how many people in my home town share my gender, age, nationality and marital status, I could (26, by the way). But since that’s not really what I’m after, I used dplyr to create a summary population table, and immediately added a new boolean column that compares the town names in Flemish and French.

#Creating a dataframe with total population for each town, and adding a column to see whether they have the same name
popdata1 <- data %>% 
  group_by(TownNL, TownFR, Region, REFNIS) %>% 
  summarise(population=sum(Population)) %>% 
  arrange(desc(population)) %>%
  mutate(SameName = TownNL==TownFR) %>% 
  ungroup()

Quite quickly an issue presented itself though: while browsing through some breakdowns, I noticed that some town names are annotated with their district. Beveren, for instance, has the same name in Flemish and French, but its district got translated, so it was flagged as a town with different names in Flemish and French.

#Noticing an issue: 
popdata1%>%
  filter(Region=="Flanders") %>% 
  filter(!SameName) %>% 
  slice (11:13)
## # A tibble: 3 x 6
##                   TownNL                  TownFR   Region REFNIS
##                    <chr>                   <chr>    <chr>  <chr>
## 1 Beveren (Sint-Niklaas) Beveren (Saint-Nicolas) Flanders  46003
## 2            Dendermonde                Termonde Flanders  42006
## 3              Vilvoorde                Vilvorde Flanders  23088
## # ... with 2 more variables: population <dbl>, SameName <lgl>

To get rid of the districts, I cleaned out any word pattern between brackets, and re-generated a boolean column DiffName to see whether the town names are different.

#Removing the sectors between brackets
popdata <- popdata1
popdata$TownNL <- str_replace(popdata$TownNL, pattern="\\s\\(.+\\)", replacement="")
popdata$TownFR <- str_replace(popdata$TownFR, pattern="\\s\\(.+\\)", replacement="")

#Reassessing whether the names are the same
popdata <- popdata %>% 
  mutate(DiffName = TownNL != TownFR) %>%
  select(TownNL, TownFR, DiffName, population, Region, REFNIS)
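
To see what that pattern does, here is a minimal sketch on a few sample strings. It uses base R's `sub()` instead of `str_replace()` so that it runs without loading stringr (both replace only the first match). The first two names come from the output above; the third is a made-up name, not one from the dataset.

```r
# Pattern from the post: one whitespace character followed by a bracketed group
pattern <- "\\s\\(.+\\)"

# The first two names appear in the output above; the third is hypothetical
names_raw <- c("Beveren (Sint-Niklaas)", "Dendermonde", "Town (a) suffix (b)")

# Strip the bracketed annotation; names without brackets are left untouched
names_clean <- sub(pattern, "", names_raw)
print(names_clean)
```

Because `.+` is greedy, everything from the first opening bracket to the last closing bracket is removed. That is fine as long as a name carries at most one bracketed annotation, which is an assumption about the data rather than something checked here.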



A glimpse of the data exploration

There are 95 towns/cities with two different official names, which is about 16% (0.1612903) of all towns. Contrary to what some people assume, the share is more or less similar in both regions: 13% of Flemish towns also have an official French name, and 16% of Walloon towns have an official Flemish name on top. Only Brussels, an officially bilingual region, has a much higher percentage of ‘double names’.

#How many have exactly the same name?

popdata %>% 
  summarise(NTowns_DiffName = sum(DiffName), Prop_DiffName = mean(DiffName)) %>%
  knitr::kable()
| NTowns_DiffName | Prop_DiffName |
|---|---|
| 95 | 0.1612903 |

#by region
popdata %>% 
  group_by(Region) %>% 
  summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName), 
           Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))%>%
  knitr::kable()
| Region | NTowns | N_SameName | N_DiffName | Prop_SameName | Prop_DiffName |
|---|---|---|---|---|---|
| Brussels agglomeration | 19 | 6 | 13 | 0.32 | 0.68 |
| Flanders | 308 | 269 | 39 | 0.87 | 0.13 |
| Wallonia | 262 | 219 | 43 | 0.84 | 0.16 |
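
As a quick sanity check, the national figure can be recomputed from the regional counts. This is a minimal base R sketch that only uses the numbers printed in the table above; the vector names are made up for the example.

```r
# Counts taken from the regional table above
n_towns <- c(Brussels = 19, Flanders = 308, Wallonia = 262)
n_diff  <- c(Brussels = 13, Flanders = 39,  Wallonia = 43)

# National totals and the overall share of towns with two names
total_towns   <- sum(n_towns)   # 589
total_diff    <- sum(n_diff)    # 95
overall_share <- total_diff / total_towns

# Regional shares, rounded to two decimals like the table
regional_share <- round(n_diff / n_towns, 2)

print(overall_share)
print(regional_share)
```

The overall share is a weighted average of the regional ones, which is why it sits close to the Flemish and Walloon values: Brussels contributes only 19 of the 589 towns.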

Mapping the towns with two official names

Using tmap I created my first two maps: one that shows the general regions in Belgium, and a second comparative one highlighting just the towns that have two official town names.

#Importing SPdataframe for Belgium
data("BE_ADMIN_MUNTY", package="BelgiumMaps.StatBel")

#creating a Region2 for making the second plot highlighting only DiffName towns
popdatamap <- popdata %>%
  mutate(Region2 = ifelse(DiffName==TRUE, Region, NA))

#Merging my 2017 data with the SPdataframe
mapdata <- merge(BE_ADMIN_MUNTY, popdatamap, by.x = "CD_MUNTY_REFNIS", by.y = "REFNIS")
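
Before a merge like this, it can be worth checking that the municipality codes line up on both sides. Below is a minimal base R sketch with toy code vectors. They are illustrative stand-ins for the real key columns, filled with a few codes that appear in the tables of this post.

```r
# Toy stand-ins for the two key columns (not the real data)
map_codes  <- c("46003", "42006", "23088", "21004")  # would be BE_ADMIN_MUNTY$CD_MUNTY_REFNIS
data_codes <- c("46003", "42006", "23088")           # would be popdatamap$REFNIS

# Codes present on the map but missing from the population data, and vice versa
missing_in_data <- setdiff(map_codes, data_codes)
missing_in_map  <- setdiff(data_codes, map_codes)

print(missing_in_data)
print(missing_in_map)
```

Both keys also need to be of the same type for the comparison to be meaningful; in the output above, REFNIS is a character column.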



#palette generation
virpalette <- rev(viridis(3))

#Plot different regions
regionplot<- tm_shape(mapdata) +
  tm_fill(col="Region", palette=virpalette,
          title = "Regions in Belgium")+
  tm_polygons()+
  tm_layout(legend.position = c("left", "bottom"))


#Plot to show those with different names by region
nameplot <- tm_shape(mapdata) +
  tm_fill(col="Region2", palette=virpalette, 
          colorNA = "gray90", textNA="Same name", 
          title = "Different regional town names",legend.position = c("left", "bottom" ))+
  tm_polygons()+
  tm_layout(legend.position = c("left", "bottom"))

#Show both plots next to each other
tmap_arrange(regionplot, nameplot)

First of all, for people not familiar with Belgium: the left plot shows our basic regions.

* The yellow dot in the middle is the Brussels agglomeration, which is officially bilingual.
* The north in green is Flanders, where the official language is Dutch (or Flemish, as we call it).
* The south in purple is Wallonia, where the official language is French.
* The divide between green and purple is called the language border…
* To make things even more complicated, some towns in Flanders or Wallonia have a special status: they have “language facilities”. To make something complicated very simple: they are bilingual without being bilingual.

The image on the right just shows all the towns with two official town names. Seeing a higher concentration of these towns around the language border is not a complete surprise, but it does not explain the majority of towns.

## Warning: One tm layer group has duplicated layer types, which are omitted.
## To draw multiple layers of the same type, use multiple layer groups (i.e.
## specify tm_shape prior to each of them).


Distilling the reason for two official town names

Reason 1: Brussels, an official Bilingual region

In the table above it was obvious that the Brussels region has a much higher share of towns with two official names: 68% versus the country average of 16%. Given Brussels’ status as a bilingual region, that should not come as a surprise. I was actually more surprised to realize that there are still 6 towns that have only their original name. Ganshoren, for instance, is a typical Flemish name that is not that easy to pronounce in French.

#Checking the data on Brussels
popdata %>% 
  filter(Region=="Brussels agglomeration") %>% 
  summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName), 
            Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))%>%
  knitr::kable()
| NTowns | N_SameName | N_DiffName | Prop_SameName | Prop_DiffName |
|---|---|---|---|---|
| 19 | 6 | 13 | 0.32 | 0.68 |

#List of names for Brussels
popdata %>% 
  filter(Region=="Brussels agglomeration") %>% 
  group_by(DiffName) %>%
  arrange(desc(DiffName), desc(population)) %>%
  knitr::kable()
| TownNL | TownFR | DiffName | population | Region | REFNIS |
|---|---|---|---|---|---|
| Brussel | Bruxelles | TRUE | 176545 | Brussels agglomeration | 21004 |
| Schaarbeek | Schaerbeek | TRUE | 133042 | Brussels agglomeration | 21015 |
| Sint-Jans-Molenbeek | Molenbeek-Saint-Jean | TRUE | 96629 | Brussels agglomeration | 21012 |
| Elsene | Ixelles | TRUE | 86244 | Brussels agglomeration | 21009 |
| Ukkel | Uccle | TRUE | 82307 | Brussels agglomeration | 21016 |
| Vorst | Forest | TRUE | 55746 | Brussels agglomeration | 21007 |
| Sint-Lambrechts-Woluwe | Woluwe-Saint-Lambert | TRUE | 55216 | Brussels agglomeration | 21018 |
| Sint-Gillis | Saint-Gilles | TRUE | 50471 | Brussels agglomeration | 21013 |
| Sint-Pieters-Woluwe | Woluwe-Saint-Pierre | TRUE | 41217 | Brussels agglomeration | 21019 |
| Oudergem | Auderghem | TRUE | 33313 | Brussels agglomeration | 21002 |
| Sint-Joost-ten-Node | Saint-Josse-ten-Noode | TRUE | 27115 | Brussels agglomeration | 21014 |
| Watermaal-Bosvoorde | Watermael-Boitsfort | TRUE | 24871 | Brussels agglomeration | 21017 |
| Sint-Agatha-Berchem | Berchem-Sainte-Agathe | TRUE | 24701 | Brussels agglomeration | 21003 |
| Anderlecht | Anderlecht | FALSE | 118241 | Brussels agglomeration | 21001 |
| Jette | Jette | FALSE | 51933 | Brussels agglomeration | 21010 |
| Etterbeek | Etterbeek | FALSE | 47414 | Brussels agglomeration | 21005 |
| Evere | Evere | FALSE | 40394 | Brussels agglomeration | 21006 |
| Ganshoren | Ganshoren | FALSE | 24596 | Brussels agglomeration | 21008 |
| Koekelberg | Koekelberg | FALSE | 21609 | Brussels agglomeration | 21011 |

#Adding a column to note down the reason for different names
reason_BXL <- popdata %>% 
  filter(Region=="Brussels agglomeration") %>% 
  filter(DiffName) %>%
  mutate(Reason = "Brussels")

Reason 2: Larger cities

Cities are generally more important, and I would have guessed that most of our cities have two official names. Just looking at the difference in average population between towns that have two names (DiffName==TRUE) and those that don’t, there clearly is a skew towards more populous towns.
A quick plot in ggplot confirms this: grey shows all the towns in Belgium by population size on a logarithmic scale. I coloured those that have two names in green.

popdata %>%
  group_by(DiffName) %>% 
  summarise(mean=mean(population), median=median(population))
## # A tibble: 2 x 3
##   DiffName     mean median
##      <lgl>    <dbl>  <dbl>
## 1    FALSE 14744.06  11383
## 2     TRUE 42510.78  24701
#Plotting the distribution of town sizes, highlighting towns with two official names
ggplot()+
  geom_histogram(data=popdata, aes(x=population), fill="grey", alpha=0.6)+
  geom_histogram(data=subset(popdata, DiffName==TRUE), aes(x=population), fill="cadetblue4", alpha=1)+
  scale_x_log10()+
  labs(x= "Population", y="Number of towns", title="Size of towns with two official names amongst all towns in Belgium")

I took a shortcut to define our cities: the 10% highest populated towns.

#10% largest towns and cities in Belgium
quantile(popdata$population, probs = seq(from = 0, to = 1, by = .1))
##      0%      10%      20%      30%      40%      50%      60%      70% 
##    89.0   4372.2   6341.8   8308.4  10268.4  12123.0  14649.6  18473.6 
##     80%      90%     100% 
## 23259.6  34189.8 520504.0

#Proportion of Cities with different names
popdata %>% 
  filter(population > 34000) %>%
  summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName), 
            Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))%>%
  knitr::kable()
| NTowns | N_SameName | N_DiffName | Prop_SameName | Prop_DiffName |
|---|---|---|---|---|
| 60 | 27 | 33 | 0.45 | 0.55 |

#Adding a reason column 
reason_city <- popdata %>% 
  filter(population > 34000) %>%
  filter(Region != "Brussels agglomeration") %>% 
  filter(DiffName) %>% 
  mutate(Reason = "City")
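
The cut-off of 34000 is a rounded version of the 90% quantile shown above. Here is a minimal sketch of computing the threshold instead of hard-coding it, on a toy population vector (made-up numbers, chosen so the result is easy to verify by hand):

```r
# Toy populations: 11 towns with 1000, 2000, ..., 11000 inhabitants (made up)
toy_population <- seq(1000, 11000, by = 1000)

# 90% quantile with R's default method (type 7)
cutoff <- quantile(toy_population, probs = 0.9, names = FALSE)

# Towns strictly above the cut-off count as "cities" in this sketch
toy_cities <- toy_population[toy_population > cutoff]

print(cutoff)
print(toy_cities)
```

On the real data the two approaches differ slightly: with 589 towns, the exact quantile of 34189.8 would select 59 towns, one fewer than the 60 found with the rounded threshold.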

Reason 3: German speaking region (and towns with German language facilities)

After World War I, the Treaty of Versailles listed the annexation of 9 German towns into Belgium as war compensation. They make up our third language region, as German is still their main language today.
Given that German and Dutch are both Germanic languages and have a lot of similarities, it would make sense that the Flemish use the German town names, while the French have changed some of them.

#Listing the German communes and the two additional towns with german facilities
germanspeaking <- c("Eupen", "Kelmis", "Lontzen", "Raeren", "Amel", "Büllingen", 
                    "Burg-Reuland", "Bütgenbach", "Sankt Vith", "Malmedy", "Weismes")

#Proportion of German-speaking towns with different names
popdata %>% 
  filter(TownNL %in% germanspeaking) %>%
  summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName), 
            Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))%>%
  knitr::kable()
| NTowns | N_SameName | N_DiffName | Prop_SameName | Prop_DiffName |
|---|---|---|---|---|
| 11 | 5 | 6 | 0.45 | 0.55 |

#German towns with two official names
popdata %>% 
  filter(TownNL %in% germanspeaking) %>%
  filter(DiffName==TRUE) %>% 
  print(n=nrow(.))%>%
  knitr::kable()

| TownNL | TownFR | DiffName | population | Region | REFNIS |
|---|---|---|---|---|---|
| Kelmis | La Calamine | TRUE | 10964 | Wallonia | 63040 |
| Sankt Vith | Saint-Vith | TRUE | 9661 | Wallonia | 63067 |
| Weismes | Waimes | TRUE | 7493 | Wallonia | 63080 |
| Bütgenbach | Butgenbach | TRUE | 5583 | Wallonia | 63013 |
| Amel | Amblève | TRUE | 5523 | Wallonia | 63001 |
| Büllingen | Bullange | TRUE | 5489 | Wallonia | 63012 |

#Adding a reason column 
reason_german <- popdata %>% 
  filter(TownNL %in% germanspeaking) %>%
  filter(DiffName) %>% 
  mutate(Reason = "German region")

Reason 4: Towns in Flanders or Wallonia with official language facilities

Always a topic for debate in Belgium: the towns with official language facilities. These are towns that belong to one region but they have some degree of bilingual facilities (it’s complicated!).

#Listing all towns with language facilities
faciliteiten <- c("Bever", "Drogenbos", "Herstappe", "Kraainem", "Linkebeek", 
                  "Mesen", "Ronse", "Sint-Genesius-Rode", "Spiere-Helkijn", 
                  "Voeren", "Wemmel", "Wezembeek-Oppem", "Edingen", 
                  "Komen-Waasten", "Moeskroen", "Vloesberg")

#Proportion of towns with language facilities that have different names
popdata %>% 
  filter(TownNL %in% faciliteiten) %>%
  summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName), 
            Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))%>%
  knitr::kable()
| NTowns | N_SameName | N_DiffName | Prop_SameName | Prop_DiffName |
|---|---|---|---|---|
| 16 | 6 | 10 | 0.38 | 0.62 |

#Which towns have different names?
popdata %>% 
  filter(TownNL %in% faciliteiten) %>%
  filter(DiffName==TRUE) %>% 
  print(n=nrow(.))%>%
  knitr::kable()

| TownNL | TownFR | DiffName | population | Region | REFNIS |
|---|---|---|---|---|---|
| Moeskroen | Mouscron | TRUE | 57773 | Wallonia | 54007 |
| Ronse | Renaix | TRUE | 26092 | Flanders | 45041 |
| Sint-Genesius-Rode | Rhode-Saint-Genèse | TRUE | 18231 | Flanders | 23101 |
| Komen-Waasten | Comines-Warneton | TRUE | 18102 | Wallonia | 54010 |
| Edingen | Enghien | TRUE | 13563 | Wallonia | 55010 |
| Voeren | Fourons | TRUE | 4129 | Flanders | 73109 |
| Vloesberg | Flobecq | TRUE | 3426 | Wallonia | 51019 |
| Bever | Biévène | TRUE | 2160 | Flanders | 23009 |
| Spiere-Helkijn | Espierres-Helchin | TRUE | 2142 | Flanders | 34043 |
| Mesen | Messines | TRUE | 1049 | Flanders | 33016 |

#Adding a reason column
reason_facilities <- popdata %>% 
  filter(TownNL %in% faciliteiten) %>%
  filter(DiffName) %>% 
  anti_join(reason_city) %>% 
  mutate(Reason = "Language facilities")

Reason 5: Other reasons

The language border obviously plays a big role, so I gathered any other towns along the border even if they have no language facilities. In many of these cases, towns have changed which region they belong to throughout history.

Lastly, I wanted to make an “other reason” category, and bind all the reasons to my main data.

#Language border

language_border <- c("Heuvelland", "Komen-Waasten", "Mesen", "Menen", "Kortrijk", "Moeskroen", "Spiere-Helkijn",
                     "Ronse", "Elzele", "Vloesberg", "Lessen", "Geraardsbergen", "Bever", "Opzullik",
                     "Edingen", "Rebecq", "Tubeke", "Kasteelbrakel", "Halle", "Sint-Genesius-Rode", 
                     "Eigenbrakel", "Terhulpen", "Waver", "Graven", "Bevekom", "Geldenaken", "Tienen", "Lijsem", 
                     "Hannuit", "Borgworm", "Oerle", "Tongeren", "Bitsingen", "Voeren", "Wezet")


reason_langborder <- popdata %>% 
  filter(TownNL %in% language_border) %>%
  filter(DiffName) %>% 
  anti_join(reason_city) %>% 
  anti_join(reason_facilities) %>% 
  mutate(Reason = "Language border")


#Other
reason_other <- popdata %>% 
  filter(DiffName) %>% 
  anti_join(reason_city) %>% 
  anti_join(reason_BXL) %>% 
  anti_join(reason_german) %>% 
  anti_join(reason_facilities) %>% 
  anti_join(reason_langborder) %>%
  mutate(Reason = "Other")



#Merging reasons
reason <- bind_rows(reason_BXL, reason_city, reason_german, reason_facilities, reason_langborder, reason_other)

#Searching for duplicates before join
reason %>% 
  group_by(REFNIS) %>% 
  filter(n() > 1)


#Joining
popdata_reason <- left_join(popdata, reason)
popdata_reason <- popdata_reason %>%
  mutate(Region2 = ifelse(DiffName==TRUE, Region, NA))
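
The chain of `anti_join()` calls above effectively gives every town the first reason that applies, in a fixed order of priority. Below is a minimal base R sketch of the same idea on four rows copied from the tables in this post. The `_toy` vectors are shortened, illustrative versions of the real lists, and the reverse-order assignment is a substitute technique, not the code used in this post.

```r
# Four towns with two official names, copied from the tables above
toy <- data.frame(
  TownNL     = c("Brussel", "Moeskroen", "Kelmis", "Mesen"),
  Region     = c("Brussels agglomeration", "Wallonia", "Wallonia", "Flanders"),
  population = c(176545, 57773, 10964, 1049),
  stringsAsFactors = FALSE
)

# Shortened, illustrative versions of the lists used in the post
germanspeaking_toy <- c("Kelmis")
faciliteiten_toy   <- c("Moeskroen", "Mesen")

# Assign reasons from lowest to highest priority: later lines overwrite earlier ones
toy$Reason <- "Other"
toy$Reason[toy$TownNL %in% faciliteiten_toy]       <- "Language facilities"
toy$Reason[toy$TownNL %in% germanspeaking_toy]     <- "German region"
toy$Reason[toy$population > 34000]                 <- "City"
toy$Reason[toy$Region == "Brussels agglomeration"] <- "Brussels"

# Each town appears once, so each town ends up with exactly one reason
stopifnot(!anyDuplicated(toy$TownNL))
print(toy[, c("TownNL", "Reason")])
```

Moeskroen shows why the order matters: it has language facilities but is also above the city threshold, so it lands in the City group, just as `anti_join(reason_city)` ensures in the code above.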

A quick reason map:
The Brussels and German regions are pretty obvious dots on the map, along with the language border and facility towns. Large cities are scattered across the whole of Belgium, and many of the unidentified scattered dots also represent smaller cities (like Aarlen/Arlon or Temse/Tamise).
There is another group of towns close together that have two names even if

Making a final map

I wanted to bring it all together in one final interactive map:
